This is a follow-up to our lab conversation. The aim is to introduce using ggplot2 to create visuals in R. To minimze the frustrating and maximize the helpful. We will use a dataset from the State of Alaskan Salmon and People project and available from the Knowledge Network for Biocomplexity.

While some familiarity with R can be useful in understanding what is happening within the ggplot2 package, this workshop will focus on introducing the functionality of ggplot2 using pre-cleaned datasets.

If you do not already have R and R studio available on your computer you can find directions to downloading it here:[R] https://cran.rstudio.com/ and [R Studio] https://www.rstudio.com/products/rstudio/download/#download.

Using ggplot2 to create visuals

Now that we have the dataframe loaded and are familiar with what all is included, we can start to explore using ggplot2 to create plots and visuals.

As a basic introduction, ggplot2 uses a specific grammar of graphics (thus the gg) that relies on layers. The layers in ggplot2 are ordered as: data, mapping, geom, stat, position, scale, coord, and facet and we will walk through adding and altering each layer using the above data.

Data & Mapping

The definition of the data and which variables map onto which aesthetics is required by ggplot2. Thankfully, it is fairly straightforward after you have decided which variables you are interested in.

Here I have it set to the per capita harvest of salmon for subsistence (y) by latitude (x)

plot1<-ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest))

plot1

At this point, we can see that we have the variables on the correct axises and the scales have been set, but there are no data being displayed. If we want, we can see the information that ggplot is holding about plot1 by entering View(plot1).

The next step is to decide how we want to visualise the data and let ggplot know so that it can project the data onto the coordinates we just set up.

Geom

We accomplish this through the geom layer. There are many geoms available for using with ggplot2, each comes with its own preset functionalities and it takes some consideration of the type of data you are working with: is it continuous? categorical? and the relationships you are interested in exploring. This site provides a tool to help you walk through what vistuals might be a good fit with your data and story: https://www.data-to-viz.com/#explore.

A handy spot to gain familiarity with the range of geoms available is: https://ggplot2.tidyverse.org/reference/#section-layer-geoms.

With these options and the variables that we chose, what geom would you be interested in trying?

Here are some examples

plot2<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point()
plot2

plot3<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_bin2d()
plot3

# Instead of redefining the data we want to use each time, we could also call plot1

plot4<- plot1 +
  geom_smooth(method="glm")
plot4

# This adds a line based on a generalized linear model ("glm") with a confidence interval

# We can also combine multiple geoms

plot5<-ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point() +
  geom_smooth(method="glm") +
  geom_text(aes(label=city), check_overlap = TRUE)
plot5

# Which is not the clearest, but gives you an idea of how you can add more information quickly

Those are useful geoms when both our variables are continuous, but what if we have a categorical variable on the x-axis, like region?

We start in the same way, by defining the data and mapping, then use different geoms to visualise:

plot6<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest))

plot7<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_point()
plot7

plot8<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_boxplot()
plot8

plot9<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_violin()
plot9

But what if we want to look at and start to compare the sum or average at the region level?

Stat

That is where stat comes in. Each geom has a default stat, in the geom_point example identity was used. That means that the plot will use the value in the cell to determine where to place the point or f(x)=x.

If we want to look at the average value for y for each x, we can set the stat to summary and specify that we want the plot to display the mean y value for each x or fun.y="mean".

plot10<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_point(stat="summary", fun.y="mean")
plot10

# we can also use different geoms now that might have made less sense when visualising each community's data

plot11<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_bar(stat="summary", fun.y="mean")
plot11

# finally we can combine these together if we want to have both the summary and individual community data

plot12<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_bar(stat="summary", fun.y="mean") +
  geom_point()
plot12

Position

In this plot we have some points that are overlapping, because they are all aligned to the center of the x-variable. Adjusting the position can help us see these better.

# We can use position_jitter to add random "noise" to the x-value to reduce the overlap

plot13<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_bar(stat="summary", fun.y="mean") +
  geom_point(position=position_jitter(width=0.1))
plot13

# This will add more "noise"

plot14<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_bar(stat="summary", fun.y="mean") +
  geom_point(position=position_jitter(width=0.4))
plot14

Position is also how we can create stacked bar plots. For example, if we were interested in seeing how each community contributed to regional subsistence harvests, we could use a stacked bar plot.

plot15<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) 

plot16<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_col(position=position_stack())
plot16

# Which we can change the colours for to get a better visual
plot17<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_col(position=position_stack(), 
           colour="#769fad", 
           fill="white")
plot17

Here use fill and colour to set specific colours that we prefer to use for all the data. Note in this case fill and colour are not inside the aes() because they are not being defined by the data.

What if we want the colours to be reflective of the data, for example different colours by amount harvested?

Scale

We can use scale to define the colours and how they should be assigned based on the data.

For categorical data, discrete colour scales are recommended, for continous data gradiant colour scales are recommended. Here colour and fill will need to be within aes()

plot18<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(colour=subs_lbs_per_capita_harvest),
             position=position_jitter(width=0.2)) +
  scale_colour_continuous()
plot18

plot19<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(colour=region), 
             position=position_jitter(width=0.2)) +
  scale_colour_discrete()
plot19

# Scaling can also be used to assign different shapes, line type, size, and transparency.

plot20<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(shape=region, colour=region),
             position=position_jitter(width=0.2)) + 
  scale_colour_discrete() +
  scale_shape_manual(values=c(1:13))
plot20

plot21<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(colour=region, size=total_2016),
               position=position_jitter(width=0.2)) +
  scale_colour_discrete() +
  scale_size_continuous(range=c(1,20))
plot21

For colouring, some geoms, like geom_bar, allow fill and colour to be defined.

plot22<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_bar(aes(colour=region), 
           stat="summary", 
           fun.y="mean") +
  scale_colour_discrete()
plot22

plot23<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_bar(aes(fill=region), 
           stat="summary", 
           fun.y="mean") +
  scale_colour_discrete()
plot23

Coord

The coordinate system in all these examples has been defaulting to a Cartesian coordinate system, which is the most common for two dimensions. This will usually be sufficient. However, if you are interested in creating pie charts or spider-plots, you will need to change the coordinate system.

plot24<- ggplot(wellbeing.ak, aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_bar(aes(fill=region), 
           stat="summary", 
           fun.y="mean") +
  scale_colour_discrete() +
  coord_polar()
plot24

There are a lot of opinions about what makes a good plot, and they seem to be especially strong around changing the coordinate system. While ggplot2 helps making good plots easier, you can certainly still make bad plots.

Here is a helpful spot to check for more discussion on what might be considered as good and bad: https://www.data-to-viz.com/caveats.html.

Facet

The final layer is to define the facet, which actually works by redifining the coordinate system, but we don’t really have to worry about that. Facet allows us to create multiple, smaller plots so we can explore if relationships are common across other variables.

plot25<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(size=total_2016), 
             colour="#0f3c52", 
             alpha=0.5,
             position=position_jitter(width=1)) +
  scale_size_continuous(range=c(2,20)) +
  facet_wrap(vars(region))
plot25

You can see how we can easily start to explore a lot of information on the same plot. And this is really what I have found to be the strength of ggplot2.

Moving from any of the plots above to something you might want to include in a presentation, report, or article takes a bit more fiddling. The good news is that for a lot of what you might want to do, someone has already done the fiddling for you (e.g. themes). And if they have not, or you want to make further changes you can do that once and then save it as a function that you can easily call for future plots and even other projects.

Making things pretty –but also readable

If you are like me, this is where you can end up spending a lot of time. The best way to save yourself some time is to use themes that others have already created. These are easy to use and let you quickly set a lot of the formatting and colours.

Themes

plot26<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(size=total_2016), 
             colour="#0f3c52", 
             alpha=0.5,
             position=position_jitter(width=1)) +
  scale_size_continuous(range=c(2,20)) +
  facet_wrap(vars(region)) +
  theme_bw()
plot26

plot27<- plot25 +
  theme_dark()
plot27

plot28<- plot25 +
  theme_minimal()
plot28

To modify components of a theme use theme().

# Here we can move the legend to the bottom
plot29<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(size=total_2016), 
             colour="#0f3c52", 
             alpha=0.5,
             position=position_jitter(width=1)) +
  scale_size_continuous(range=c(2,20)) +
  facet_wrap(vars(region)) +
  theme_minimal() +
  theme(legend.position="bottom")
plot29

Labels

In these examples, labels are a spot where we can make a lot of improvements. We have x and y variables that are not described very well, and no titles. Most of these can be adjusted by adding a labs() layer.

plot30<-ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(size=total_2016), 
             colour="#0f3c52", 
             alpha=0.5,
             position=position_jitter(width=1)) +
  scale_size_continuous(range=c(2,20)) +
  facet_wrap(vars(region)) +
  theme_minimal() +
  theme(legend.position="bottom") + 
  labs(x="Latitude", 
       y="Pounds per Capita", 
       size="Total Population, 2016")

plot30

# note that we can use \n to create a line break

# We can also use labs() to insert a Title, Subtitle, and Caption
plot31<-plot30 +
  labs(title="Salmon Harvested for Subsistence by Latitude", 
       subtitle="At the community level, across regions", 
       caption="Use of subsistence salmon from ADF&G surveys.\n Population from US Census, 2016.") +
  theme_minimal() +
  theme(legend.position = "bottom")
plot31

A note about labelling is that, so far, I have had a tough time creating labels using non-Latin alphabet characters. There are ways to do it and the closest that I have come to a solution is using packages developed by linguists. If this is something you might want to do, I’m happy to share what I have found in my searches. As a plug for language justice extending to software languages these two projects might be of interest:

https://www.cbc.ca/radio/unreserved/indigenous-language-finding-new-ways-to-connect-with-culture-1.4923962/a-hawaiian-team-s-mission-to-translate-programming-language-to-their-native-language-1.4926124

https://www.alaskapublic.org/2021/04/13/yupik-engineers-team-up-to-build-apps-for-yugtun-language-learning/*

Colours

There is really no limits here on what you can do with colours. There are some “rules of thumb” and some accessibility considerations to keep in mind, along with the format you will be using your plots for (print vs online, journals that accept coloured plots, some journals have colour-schemes…).

More about colours:

https://davidmathlogic.com/colorblind/

https://ggplot2-book.org/scale-colour.html

Some preset colours are available in ggplot2, and you can also use the package RColorBrewer to access additional predefined pallettes.

library(RColorBrewer)

# You will need to consider what type of data you are asking ggplot to scale by. If continuous, you will likely want to use a gradient; if categorical (not ordered) you will want to use discrete colours (e.g. scale_colour_discrete, scale_colour_brewer). 

plot32<- ggplot(wellbeing.ak, 
                aes(x=region, y=subs_lbs_per_capita_harvest)) +
  geom_bar(stat="summary", 
           fun.y="mean") +
  geom_point(aes(colour=lat), 
             position=position_jitter(0.2)) +
  scale_colour_viridis_b(direction=-1) 
plot32

plot33<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(colour=region, size=total_2016),
               position=position_jitter(width=0.2)) +
  scale_size_continuous(range=c(1,20)) +
  scale_colour_brewer(palette = "Set2") +
  theme_minimal()
plot33

# We can see here that the palette chosen does not have enough discrete values to cover all the regions in our dataset --this is an indication that it will be tough to tell different colours apart once you get above ~8 categories. Pairing colours with labels or shapes, and making sure similar colours are not adjacent becomes more important as you increase the number of categories and colours used.

# If we want to set our own values, we can create a pallette, making sure there are enough values for the variable we are planning to use

cbPallette=c("#cc6677", "#332288", "#ddcc77", "#117733", "#88CCEE", "#882255", "#44AA99", "#999933", "#AA4499", "#555555", "#EE7733", "#44BB99", "#004488")

plot34<- ggplot(wellbeing.ak, aes(x=lat, y=subs_lbs_per_capita_harvest)) +
  geom_point(aes(shape=region, 
                 colour=region, 
                 size=total_2016),
             position=position_jitter(width=0.2)) +
  scale_shape_manual(values=c(1:13)) +
  scale_colour_manual(values=cbPallette) +
  scale_size_continuous(range=c(1,20))

plot34

GGSave

A nice part about RStudio is that you can see the plots and get a sense of what they look like. However, it is important to know that they are just temporary, once you close your session they will no longer be available to you, unless you save them. You can save ggplot items as .pdf, .png, .jpg and other formats using ggsave.

ggsave(plot34, file="lat.harvest.region.png", dpi=400, width=10.5, height=6)

Patchwork

The above introduced the basics of ggplot2 and how to use it to generate plots using layers. There are additional packages that can be helpful to polish and use the ggplot objects you have created. Here we will look at patchwork, above I used Rcolourbrewer, and others include ggpubr, ggnetwork, and ggmap.

Patchwork is really useful if you want to work with the composition of your plots, and especially if you are looking to combine multiple plots, or other objects, in one figure.

library(patchwork)

plot31+plot32

# We can adjust the default arrangement by specifying layout, there are a few ways to do this. We can use *plot_layout* to specify the number of rows or columns, we can also use the operators of / or | instead of + . / will stack plots vertically, | will place them next to each other.

plot31 + plot32 + plot_layout(nrow=2, byrow=FALSE)

plot31 / plot32

# We can also continue to edit our ggplot2 objects. For this, the last plot listed (in our example, plot32) is considered the *active* plot and the specifications will be applied to it.

plot31 | plot32 + 
  labs(title="Salmon Harvested for Subsistence by Region", 
       subtitle = "At the community level, across regions", 
       x="SASAP Region", 
       y="Pounds per Capita", 
       colour="Latitude") + 
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 45, hjust=1))

# To make those changes retrievable later, we need to create a new plot

plot33<-plot32 +
  labs(title="Salmon Harvested for Subsistence by Region", 
       subtitle = "At the community level, across regions", 
       x="SASAP Region", 
       y="Pounds per Capita", 
       colour="Latitude") +
  theme(legend.position = "bottom", 
        axis.text.x = element_text(angle = 45, hjust=1))
plot33

# There are some additional changes we might want to make to ensure things are readable, but this gives us a good sense of what patchwork can let us do. Finally, pathwork can be used with ggplot objects that are not plots. This can by handy for including tables, images and graphics, matrices, and maps. The trick is that these objects have to be converted to ggplot objects--if they are not already, and this can require using additional packages.

library(magick)
library(ggpubr)

map <- image_read("SASAPRegions.png")
map1 <- ggplot() +
  background_image(map) + coord_fixed(ratio=1)

combo1<-plot33 | map1 +
  labs(caption="Use of subsistence salmon from ADF&G surveys.\n(Clark et al. 2018)")
combo1

ggsave(combo1, file = "combo.png", dpi = 700, width=12.35, height=5.78)

Shiny Apps

And then finally, I wanted to point you to the shiny package available with R, which provides a web application framework. I am just starting to explore this, but you can go to the shiny app gallery to see how shiny can be used with ggplot2 to allow people to adjust input data and have compelling visuals returned. Here is one example:

https://shiny.rstudio.com/gallery/soil-profiles.html

Additional Resources

https://psu-psychology.github.io/r-bootcamp-2019/talks/ggplot_grammar.html

https://ggplot2.tidyverse.org/reference/

https://codewords.recurse.com/issues/six/telling-stories-with-data-using-the-grammar-of-graphics

Citations

Jeanette Clark, Rachel Donkersloot, Courtney Carothers, and Jessica Black. Sample indicators of well-being in Alaska. Knowledge Network for Biocomplexity. doi:10.5063/F1XK8CVG.

## 
##   Wickham et al., (2019). Welcome to the tidyverse. Journal of Open
##   Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Welcome to the {tidyverse}},
##     author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
##     year = {2019},
##     journal = {Journal of Open Source Software},
##     volume = {4},
##     number = {43},
##     pages = {1686},
##     doi = {10.21105/joss.01686},
##   }
## 
## To cite package 'patchwork' in publications use:
## 
##   Thomas Lin Pedersen (2019). patchwork: The Composer of Plots. R
##   package version 1.0.0. https://CRAN.R-project.org/package=patchwork
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {patchwork: The Composer of Plots},
##     author = {Thomas Lin Pedersen},
##     year = {2019},
##     note = {R package version 1.0.0},
##     url = {https://CRAN.R-project.org/package=patchwork},
##   }
## 
## To cite package 'ggpubr' in publications use:
## 
##   Alboukadel Kassambara (2020). ggpubr: 'ggplot2' Based Publication
##   Ready Plots. R package version 0.2.5.
##   https://CRAN.R-project.org/package=ggpubr
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ggpubr: 'ggplot2' Based Publication Ready Plots},
##     author = {Alboukadel Kassambara},
##     year = {2020},
##     note = {R package version 0.2.5},
##     url = {https://CRAN.R-project.org/package=ggpubr},
##   }
## 
## To cite package 'magick' in publications use:
## 
##   Jeroen Ooms (2020). magick: Advanced Graphics and Image-Processing in
##   R. R package version 2.3. https://CRAN.R-project.org/package=magick
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {magick: Advanced Graphics and Image-Processing in R},
##     author = {Jeroen Ooms},
##     year = {2020},
##     note = {R package version 2.3},
##     url = {https://CRAN.R-project.org/package=magick},
##   }
## 
## To cite package 'RColorBrewer' in publications use:
## 
##   Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes. R package
##   version 1.1-2. https://CRAN.R-project.org/package=RColorBrewer
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {RColorBrewer: ColorBrewer Palettes},
##     author = {Erich Neuwirth},
##     year = {2014},
##     note = {R package version 1.1-2},
##     url = {https://CRAN.R-project.org/package=RColorBrewer},
##   }
## 
## To cite package 'shiny' in publications use:
## 
##   Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan
##   McPherson (2020). shiny: Web Application Framework for R. R package
##   version 1.4.0.2. https://CRAN.R-project.org/package=shiny
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {shiny: Web Application Framework for R},
##     author = {Winston Chang and Joe Cheng and JJ Allaire and Yihui Xie and Jonathan McPherson},
##     year = {2020},
##     note = {R package version 1.4.0.2},
##     url = {https://CRAN.R-project.org/package=shiny},
##   }